433 research outputs found

    Gene Expression : From Microarrays to Functional Genomics

    The era of large sequencing projects has opened unprecedented possibilities for investigating more complex aspects of living organisms. Among the high-throughput technologies based on genomic sequences, DNA microarrays are widely used for many purposes, including measuring the relative quantity of messenger RNAs. However, the reliability of microarrays has been strongly doubted, because robust methods for analyzing the complex microarray output data were developed only after the technology had already spread through the community. One objective of this study was to increase the performance of microarrays, measured by the successful validation of the results with independent techniques. To this end, emphasis was given to selecting candidate genes with remarkable biological significance within a specific experimental design. In line with literature evidence, re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression differs significantly between conditions and then grouping them into functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique was effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is the possibility of combining independent gene expression data to create a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some, involved in basic activities (named housekeeping genes), are expressed ubiquitously. Other genes (named tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining the tissue-selective genes is also important because these genes can cause disease with a phenotype in the tissues where they are expressed. The hypothesis that gene expression can be used as a measure of the relatedness of tissues was also confirmed. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. The power of microarray results can be extended by inferring the relationships among genes under certain conditions. Gene transcription is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of the gene's promoter sequence. In this study, the analysis of promoters from groups of candidate genes was used to predict gene networks and to highlight modules of transcription factors playing a central role in the regulation of their transcription. Specific modules were found to regulate the expression of genes selectively expressed in the hippocampus, an area of the brain with a central role in Major Depressive Disorder. Similarly, gene networks derived from microarray results elucidated aspects of the development of the mesencephalon, another region of the brain, which is involved in Parkinson's disease.

    CpGmotifs : a tool to discover DNA motifs associated to CpG methylation events

    Background: The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the regions surrounding CpGs has been shown to have a crucial role in the maintenance and establishment of methylation status. This aspect has previously been characterized quantitatively by inspecting the nucleotide composition of the region. Research in this field still lacks a qualitative perspective, linked to the identification of specific sequences (or DNA motifs) related to particular DNA methylation phenomena. Results: Here we present a novel computational strategy based on short DNA motif discovery to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, Shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factor binding sites (TFBS) encompassing the identified motifs. Conclusions: CpGmotifs is open source software. Its source code is available on GitHub at https://github.com/Greco-Lab/CpGmotifs and a ready-to-use Docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs. Peer reviewed

    IntEREst: Intron-exon retention estimator

    Background: In-depth study of the intron retention levels of transcripts provides insights into the mechanisms regulating pre-mRNA splicing efficiency. Additionally, detailed analysis of retained introns can link these introns to post-transcriptional regulation or identify aberrant splicing events in human diseases. Results: We present IntEREst, Intron-Exon Retention Estimator, an R package that supports rigorous analysis of non-annotated intron retention events (in addition to the ones annotated by RefSeq or similar databases), and supports intra-sample in addition to inter-sample comparisons. It accepts binary sequence alignment/map (.bam) files as input and determines genome-wide estimates of intron retention or exon-exon junction levels. Moreover, it includes functions for comparing subsets of user-defined introns (e.g. U12-type vs U2-type), and its plotting functions allow visualization of the distribution of the retention levels of the introns. Statistical methods are adapted from the DESeq2, edgeR and DEXSeq R packages to extract the significantly more or less retained introns. Analyses can be performed either sequentially (on a single core) or in parallel (on multiple cores). We used IntEREst to investigate U12- and U2-type intron retention in human and plant RNA-seq datasets with defects in the U12-dependent spliceosome due to mutations in the ZRSR2 component of this spliceosome. Additionally, we compared the retained introns discovered by IntEREst with those of other methods and studies. Conclusion: IntEREst is an R package for the analysis of intron retention and exon-exon junction levels in RNA-seq data. Both the human and plant analyses show that the U12-type introns are retained at a higher level than the U2-type introns already in the control samples, but the retention is exacerbated in patient or plant samples carrying a mutated ZRSR2 gene. Intron retention events caused by ZRSR2 mutation that we discovered using IntEREst (DESeq2-based function) show considerable overlap with the retained introns discovered by other methods (e.g. IRFinder and the edgeR-based function of IntEREst). Our results indicate that increases in both the number of biological replicates and the depth of the sequencing library promote the discovery of retained introns, but the effect of library size gradually decreases beyond 35 million reads mapped to the introns. Peer reviewed

    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of the sequences surrounding STRs, not only for distinguishing STR classes but also for predicting the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with a high level of transcription initiation, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add complexity to STR polymorphism. Peer reviewed

    Knowledge Generation with Rule Induction in Cancer Omics

    The explosion of omics data availability in cancer research has boosted the knowledge of the molecular basis of cancer, although the strategies for its definitive resolution are still not well established. The complexity of cancer biology, given by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance in many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been implemented to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as for the classification of clinically relevant sub-groups of patients and the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. The application of such techniques to the modern plethora of collected cancer omics data can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a huge amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the use of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems. Peer reviewed

    Strong conservation of inbred mouse strain microRNA loci but broad variation in brain microRNAs due to RNA editing and isomiR expression

    Diversity in the structure and expression of microRNAs, important regulators of gene expression, arises from SNPs, duplications followed by divergence, production of isomiRs, and RNA editing. Inbred mouse strains and crosses using them are important reference populations for genetic mapping, and as models of human disease. We determined the nature and extent of interstrain miRNA variation by (i) identifying miRNA SNPs in whole-genome sequence data from 36 strains, and (ii) examining miRNA editing and expression in hippocampus (Hpc) and frontal cortex (FCx) of six strains, to facilitate the study of miRNAs in neurobehavioral phenotypes. miRNA loci were strongly conserved among the 36 strains, but even the highly conserved seed region contained 16 SNPs. In contrast, we identified RNA editing in 58.9% of miRNAs, including 11 consistent editing events in the seed region. We confirmed the functional significance of three conserved edits in the miR-379/410 cluster, demonstrating that edited miRNAs gained novel target mRNAs not recognized by the unedited miRNAs. We found significant interstrain differences in miRNA and isomiR expression: of 779 miRNAs expressed in Hpc and 719 in FCx, 262 were differentially expressed (190 in Hpc, 126 in FCx, 54 in both). We also identified 32 novel miRNA candidates using miRNA prediction tools. Our studies provide the first comprehensive analysis of SNP, isomiR, and RNA editing variation in miRNA loci across inbred mouse strains, and a detailed catalog of expressed miRNAs in Hpc and FCx in six commonly used strains. These findings will facilitate the molecular analysis of neurological and behavioral phenotypes in this model organism. Peer reviewed

    Description of a low-cost picture archiving and communication system based on network-attached storage

    High costs for installing, maintaining, and updating a standard picture archiving and communication system (PACS) can be prohibitive for small/medium-sized veterinary facilities. The aims of this prospective, exploratory study were to describe the design, implementation, and the authors' experience over 1 year of use of a low-cost PACS based on network-attached storage. The system described here was easily installed and resiliently stored redundant copies of data. It struck a very good balance between data recovery, system speed, security, and available memory for storage. A virtual private network also allowed off-site data review. This system can also be used for future off-site backup of data in the cloud.

    Network Analysis of Microarray Data

    DNA microarrays are widely used to investigate gene expression. Even though the classical analysis of microarray data is based on the study of differentially expressed genes, it is well known that genes do not act individually. Network analysis can be applied to study association patterns of the genes in a biological system. Moreover, it finds wide application in differential coexpression analysis between different systems. Network-based coexpression studies have, for example, been used in (complex) disease gene prioritization, disease subtyping, and patient stratification. Peer reviewed